Introduction to Triton Programming: From Threads to Program Instances

In Triton, the fundamental unit of execution shifts from the scalar CUDA thread to the program instance. A program instance is an abstraction over a GPU thread block: a single instance processes a whole vectorized block of elements at once.
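The work split implied by "one instance per block" is plain ceiling-division arithmetic. The pure-Python sketch below (no GPU or Triton needed) lists which element range each program instance would own; the helper name `instance_ranges` is hypothetical, and the block size of 128 is an assumption chosen to match the analogy that follows.

```python
def instance_ranges(n_elements: int, block_size: int) -> list[tuple[int, int, int]]:
    """Return (pid, start, stop) for every program instance covering n_elements."""
    # Ceiling division, the same computation triton.cdiv performs for a launch grid.
    n_instances = (n_elements + block_size - 1) // block_size
    return [
        (pid, pid * block_size, min((pid + 1) * block_size, n_elements))
        for pid in range(n_instances)
    ]


n = 1000
ranges = instance_ranges(n, block_size=128)
print(len(ranges))  # 8 program instances, versus one scalar thread per element (1000)
print(ranges[-1])   # (7, 896, 1000): the last instance gets a partial block of 104 elements
```

The partial last block is why real Triton kernels mask their loads and stores.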

1. Program instance identity

Each execution unit retrieves its own identity with pid = tl.program_id(axis=0). Think of a forklift in a warehouse (the program instance) lifting a pallet (the block) of 128 boxes, as opposed to a single worker (a CUDA thread) carrying one box. The aisle number a forklift is assigned plays the role of the pid: it tells the forklift which pallet to pick up.

2. Triton tensors vs. PyTorch tensors

Understanding the semantic gap between the two is essential for reasoning about memory:

PyTorch tensor: a host-side Python object that wraps the VRAM storage together with its strides and other metadata.
Triton tensor: a compiler-level object representing values or pointers that live in registers or SRAM (shared memory) while the kernel runs.

PyTorch view
A Python object that points to (typically contiguous) global memory.

Triton view
A 1D/2D block of data held in compiler-managed registers.

3. SPMD nature

Triton follows a Single Program, Multiple Data (SPMD) execution model: every program instance executes exactly the same code. Instances diverge only when the logic uses the pid to compute instance-specific memory offsets.


QUESTION 1

What is the primary identifier for a Triton execution unit?

threadIdx.x

tl.program_id(axis=0)

tl.block_idx()

torch.get_id()

QUESTION 2

True or False: A Triton tensor is a Python object that stores metadata like strides on the host CPU.

True

False

QUESTION 3

What can happen if you forget that all program instances execute the same kernel body?

The compiler will automatically distribute tasks.

Race conditions or overwritten memory, if pid-based offset logic is missing.

The kernel will fail to compile due to a syntax error.

Execution time will double.

QUESTION 4

In the forklift analogy, what does the 'Aisle Number' represent?

The BLOCK_SIZE

The program_id (pid)

The GPU Driver version

The Pointer address

QUESTION 5

Why is the Triton model considered 'Vectorized' compared to CUDA?

It uses Python lists.

One Program Instance handles a block of elements, not just one scalar element.

It only works with 2D matrices.

It runs on the CPU's SIMD units.